Overview

Dataset statistics

Number of variables: 12
Number of observations: 7669950
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 702.2 MiB
Average record size in memory: 96.0 B
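
The memory figures are mutually consistent under one simple assumption: each of the 12 columns costs 8 bytes per row (a 64-bit number or an object pointer, with string payloads not counted). The 8-byte figure is an assumption used for this check, not something the report states. A minimal sketch of the arithmetic:

```python
# Figures taken from the dataset statistics above.
n_rows = 7_669_950
n_cols = 12

# Assumption (not stated in the report): one 8-byte value per column per row.
BYTES_PER_VALUE = 8
MIB = 1024 ** 2

record_bytes = n_cols * BYTES_PER_VALUE          # bytes per observation
total_mib = n_rows * record_bytes / MIB          # whole table
column_mib = n_rows * BYTES_PER_VALUE / MIB      # a single column

print(f"record size: {record_bytes} B")          # record size: 96 B
print(f"total size:  {total_mib:.1f} MiB")       # total size:  702.2 MiB
print(f"per column:  {column_mib:.1f} MiB")      # per column:  58.5 MiB
```

Under that assumption the sketch reproduces the reported 96.0 B per record, 702.2 MiB in total, and the 58.5 MiB listed for each variable.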

Variable types

NUM: 6
CAT: 6

Warnings

user_id has a high cardinality: 228554 distinct values
date_from has a high cardinality: 7205513 distinct values
date_until has a high cardinality: 7194377 distinct values
start_station_name has a high cardinality: 208 distinct values
end_station_name has a high cardinality: 208 distinct values
booked_via has a high cardinality: 223 distinct values
date_from is uniformly distributed
date_until is uniformly distributed
df_index has unique values
distance_in_km has 293637 (3.8%) zeros
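
The warnings on date_from and date_until follow from both columns being profiled as Categorical, that is, as 19-character strings rather than timestamps. A minimal sketch of parsing them with the standard library is shown below, using the first sample row of this report. The format string is inferred from the sample values, and rounding the elapsed time up to whole minutes is only an observation that matches the sample rows, not a documented rule of the dataset.

```python
import math
from datetime import datetime

# Layout inferred from the sample values, e.g. "2014-01-01 00:34:54".
FMT = "%Y-%m-%d %H:%M:%S"


def parse(ts: str) -> datetime:
    """Turn a date_from / date_until string into a datetime."""
    return datetime.strptime(ts, FMT)


def elapsed_minutes(date_from: str, date_until: str) -> float:
    """Elapsed time between the two timestamps, in minutes."""
    return (parse(date_until) - parse(date_from)).total_seconds() / 60


# First sample row of the report (duration_in_min is listed as 16 there).
minutes = elapsed_minutes("2014-01-01 00:34:54", "2014-01-01 00:50:14")
print(round(minutes, 2))   # 15.33
print(math.ceil(minutes))  # 16
```

Once parsed, the columns can be treated as date/time values instead of high-cardinality categories.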

Reproduction

Analysis started: 2021-03-13 12:19:10.383402
Analysis finished: 2021-03-13 12:26:04.082556
Duration: 6 minutes and 53.7 seconds
Software version: pandas-profiling v2.9.0
Download configuration: config.yaml

Variables

df_index
Real number (ℝ≥0)

UNIQUE

Distinct: 7669950
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 8056972.421
Minimum: 0
Maximum: 16228295
Zeros: 1
Zeros (%): < 0.1%
Memory size: 58.5 MiB

Quantile statistics

Minimum: 0
5-th percentile: 643750.45
Q1: 3943278.25
median: 8024147.5
Q3: 12252320.5
95-th percentile: 15458150.55
Maximum: 16228295
Range: 16228295
Interquartile range (IQR): 8309042.25

Descriptive statistics

Standard deviation: 4774769.999
Coefficient of variation (CV): 0.5926258338
Kurtosis: -1.220860839
Mean: 8056972.421
Median Absolute Deviation (MAD): 4152774
Skewness: 0.01473336482
Sum: 6.179657562e+13
Variance: 2.279842854e+13
Monotonicity: Strictly increasing
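
Several of the figures reported for a numeric variable are derived from the others by standard definitions: range = maximum - minimum, IQR = Q3 - Q1, CV = standard deviation / mean, and variance = standard deviation squared. A short sketch that re-derives them from the df_index values printed in this report (inputs are the rounded values as printed, so agreement is only up to rounding):

```python
import math

# df_index figures as printed in the report.
minimum, maximum = 0, 16228295
q1, q3 = 3943278.25, 12252320.5
mean, std = 8056972.421, 4774769.999

value_range = maximum - minimum   # Range
iqr = q3 - q1                     # Interquartile range
cv = std / mean                   # Coefficient of variation
variance = std ** 2               # Variance

print(value_range)                # 16228295
print(iqr)                        # 8309042.25
print(round(cv, 6))               # 0.592626
print(f"{variance:.4e}")          # 2.2798e+13
```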
[Figure: histogram with fixed size bins (bins=50)]

Common values
Value | Count | Frequency (%)
0 | 1 | < 0.1%
13930435 | 1 | < 0.1%
14004199 | 1 | < 0.1%
9807846 | 1 | < 0.1%
5619685 | 1 | < 0.1%
13996003 | 1 | < 0.1%
9725918 | 1 | < 0.1%
5537757 | 1 | < 0.1%
13905879 | 1 | < 0.1%
9701330 | 1 | < 0.1%
Other values (7669940) | 7669940 | > 99.9%

Minimum 5 values
Value | Count | Frequency (%)
0 | 1 | < 0.1%
1 | 1 | < 0.1%
2 | 1 | < 0.1%
4 | 1 | < 0.1%
5 | 1 | < 0.1%

Maximum 5 values
Value | Count | Frequency (%)
16228295 | 1 | < 0.1%
16228293 | 1 | < 0.1%
16228291 | 1 | < 0.1%
16228290 | 1 | < 0.1%
16228289 | 1 | < 0.1%

bike_id
Real number (ℝ≥0)

Distinct: 2681
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 121288.1916
Minimum: 106022
Maximum: 143866
Zeros: 0
Zeros (%): 0.0%
Memory size: 58.5 MiB

Quantile statistics

Minimum: 106022
5-th percentile: 108385
Q1: 116498
median: 119919
Q3: 120512
95-th percentile: 143722
Maximum: 143866
Range: 37844
Interquartile range (IQR): 4014

Descriptive statistics

Standard deviation: 10746.43914
Coefficient of variation (CV): 0.08860251765
Kurtosis: 0.4183371185
Mean: 121288.1916
Median Absolute Deviation (MAD): 893
Skewness: 1.144872394
Sum: 9.302743654e+11
Variance: 115485954.2
Monotonicity: Not monotonic
[Figure: histogram with fixed size bins (bins=50)]

Common values
Value | Count | Frequency (%)
143619 | 4872 | 0.1%
116251 | 4868 | 0.1%
119685 | 4800 | 0.1%
108483 | 4792 | 0.1%
119882 | 4744 | 0.1%
143694 | 4727 | 0.1%
120000 | 4723 | 0.1%
119612 | 4721 | 0.1%
107509 | 4704 | 0.1%
107353 | 4700 | 0.1%
Other values (2671) | 7622299 | 99.4%

Minimum 5 values
Value | Count | Frequency (%)
106022 | 71 | < 0.1%
106025 | 1296 | < 0.1%
106033 | 185 | < 0.1%
106035 | 628 | < 0.1%
106040 | 851 | < 0.1%

Maximum 5 values
Value | Count | Frequency (%)
143866 | 1697 | < 0.1%
143855 | 363 | < 0.1%
143833 | 1796 | < 0.1%
143832 | 2477 | < 0.1%
143831 | 4379 | 0.1%

user_id
Categorical

HIGH CARDINALITY

Distinct: 228554
Distinct (%): 3.0%
Missing: 0
Missing (%): 0.0%
Memory size: 58.5 MiB
Value | Count | Frequency (%)
496D35CFE3F625730E578793269E52D0A45FE53E | 2714 | < 0.1%
6DF3E96544415EBFF474F968F264E144772F508E | 2583 | < 0.1%
7439201395BB2E80301974D4D00100F1F8A7AFB4 | 2299 | < 0.1%
5EBBA60A2178EF837A2A2065E05B1A84C9B4FD94 | 2258 | < 0.1%
F4B0220EB708EB3C7D2966B0194FA19640B458C5 | 2167 | < 0.1%
B55462DA30B9D64E617B92DF0A99AC509BCC461B | 2100 | < 0.1%
19C08F00C4101E327BF935F49D228C5398AA9F06 | 2001 | < 0.1%
63D3262EA34B00E18F9A801AE1832C618FD70D49 | 1946 | < 0.1%
D56E514389AF41CEE25EB57352A9CAC5D7371006 | 1944 | < 0.1%
BDBE0F11FE2C06152C2D97FF4B02E02D1D962C6E | 1938 | < 0.1%
Other values (228544) | 7648000 | 99.7%
[Figure: frequencies of value counts]

Unique

Unique: 26999
Unique (%): 0.4%
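
"Distinct" and "Unique" measure different things. Assuming the usual pandas-profiling convention, Distinct counts how many different values occur, while Unique counts only the values that occur exactly once. The sketch below illustrates the difference on a small made-up list (the `rides` values are hypothetical) and then recomputes the two percentages reported for user_id.

```python
from collections import Counter


def distinct_and_unique(values):
    """Return (number of different values, number of values seen exactly once)."""
    counts = Counter(values)
    n_distinct = len(counts)
    n_unique = sum(1 for c in counts.values() if c == 1)
    return n_distinct, n_unique


# Hypothetical toy column: u1 appears 3 times, u2 twice, u3 and u4 once each.
rides = ["u1", "u2", "u1", "u3", "u2", "u1", "u4"]
print(distinct_and_unique(rides))        # (4, 2)

# Percentages for user_id, from the counts printed in the report.
n_rows = 7_669_950
print(f"{100 * 228554 / n_rows:.1f}%")   # 3.0%
print(f"{100 * 26999 / n_rows:.1f}%")    # 0.4%
```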
[Figure: histogram of lengths of the category]

Length

Max length: 40
Median length: 40
Mean length: 40
Min length: 40

date_from
Categorical

HIGH CARDINALITY
UNIFORM

Distinct: 7205513
Distinct (%): 93.9%
Missing: 0
Missing (%): 0.0%
Memory size: 58.5 MiB

Value | Count | Frequency (%)
2016-07-21 20:05:02 | 6 | < 0.1%
2014-07-01 00:39:04 | 6 | < 0.1%
2017-05-15 17:52:53 | 6 | < 0.1%
2015-09-08 17:39:51 | 6 | < 0.1%
2016-06-01 17:07:51 | 6 | < 0.1%
2014-08-05 17:01:52 | 5 | < 0.1%
2016-08-14 14:29:04 | 5 | < 0.1%
2016-07-26 18:26:49 | 5 | < 0.1%
2017-03-28 18:13:37 | 5 | < 0.1%
2016-07-21 19:14:48 | 5 | < 0.1%
Other values (7205503) | 7669895 | > 99.9%
[Figure: frequencies of value counts]

Unique

Unique: 6766286
Unique (%): 88.2%
[Figure: histogram of lengths of the category]

Length

Max length: 19
Median length: 19
Mean length: 19
Min length: 19

date_until
Categorical

HIGH CARDINALITY
UNIFORM

Distinct: 7194377
Distinct (%): 93.8%
Missing: 0
Missing (%): 0.0%
Memory size: 58.5 MiB

Value | Count | Frequency (%)
2017-04-02 16:05:32 | 6 | < 0.1%
2016-05-10 17:44:08 | 5 | < 0.1%
2016-06-05 15:46:28 | 5 | < 0.1%
2015-07-23 19:23:26 | 5 | < 0.1%
2014-08-13 18:57:28 | 5 | < 0.1%
2015-04-23 17:48:13 | 5 | < 0.1%
2017-03-28 17:51:08 | 5 | < 0.1%
2016-05-27 17:50:22 | 5 | < 0.1%
2015-06-12 22:13:19 | 5 | < 0.1%
2016-09-19 17:35:27 | 5 | < 0.1%
Other values (7194367) | 7669899 | > 99.9%
[Figure: frequencies of value counts]

Unique

Unique: 6745259
Unique (%): 87.9%
[Figure: histogram of lengths of the category]

Length

Max length: 19
Median length: 19
Mean length: 19
Min length: 19

start_station_name
Categorical

HIGH CARDINALITY

Distinct: 208
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 58.5 MiB

Value | Count | Frequency (%)
Allende-Platz/Grindelhof | 157780 | 2.1%
Schulterblatt/Eifflerstraße | 145002 | 1.9%
Mundsburg / Schürbeker Straße | 112640 | 1.5%
Goldbekplatz / Semperstraße | 111603 | 1.5%
Lange Reihe / Lohmühlenpark | 108510 | 1.4%
Jungfernstieg / Ballindamm | 108407 | 1.4%
Jarrestraße / Rambatzweg | 104656 | 1.4%
Neuer Pferdemarkt / Beim Grünen Jäger | 103806 | 1.4%
Paulinenplatz/Wohlwillstraße | 101823 | 1.3%
Eduard-Rhein-Ufer / Schwanenwik | 100546 | 1.3%
Other values (198) | 6515177 | 84.9%
[Figure: frequencies of value counts]

Unique

Unique: 0
Unique (%): 0.0%
[Figure: histogram of lengths of the category]

Length

Max length: 48
Median length: 28
Mean length: 28.92616262
Min length: 14

start_station_id
Real number (ℝ≥0)

Distinct: 208
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 169794.3685
Minimum: 131543
Maximum: 268358
Zeros: 0
Zeros (%): 0.0%
Memory size: 58.5 MiB

Quantile statistics

Minimum: 131543
5-th percentile: 131641
Q1: 131885
median: 140796
Q3: 211706
95-th percentile: 244935
Maximum: 268358
Range: 136815
Interquartile range (IQR): 79821

Descriptive statistics

Standard deviation: 41570.88457
Coefficient of variation (CV): 0.2448307616
Kurtosis: -1.316271791
Mean: 169794.3685
Median Absolute Deviation (MAD): 9154
Skewness: 0.5420001566
Sum: 1.302314317e+12
Variance: 1728138444
Monotonicity: Not monotonic
[Figure: histogram with fixed size bins (bins=50)]

Common values
Value | Count | Frequency (%)
198077 | 157780 | 2.1%
131648 | 145002 | 1.9%
140799 | 112640 | 1.5%
140796 | 111603 | 1.5%
138385 | 108510 | 1.4%
131879 | 108407 | 1.4%
138376 | 104656 | 1.4%
131890 | 103806 | 1.4%
131547 | 101823 | 1.3%
140800 | 100546 | 1.3%
Other values (198) | 6515177 | 84.9%

Minimum 5 values
Value | Count | Frequency (%)
131543 | 90273 | 1.2%
131546 | 52956 | 0.7%
131547 | 101823 | 1.3%
131639 | 72443 | 0.9%
131640 | 21095 | 0.3%

Maximum 5 values
Value | Count | Frequency (%)
268358 | 81 | < 0.1%
264821 | 2964 | < 0.1%
264820 | 6686 | 0.1%
264330 | 1222 | < 0.1%
256467 | 5401 | 0.1%

end_station_name
Categorical

HIGH CARDINALITY

Distinct: 208
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 58.5 MiB

Value | Count | Frequency (%)
Allende-Platz/Grindelhof | 161211 | 2.1%
Schulterblatt/Eifflerstraße | 145284 | 1.9%
Jungfernstieg / Ballindamm | 114148 | 1.5%
Goldbekplatz / Semperstraße | 113991 | 1.5%
Mundsburg / Schürbeker Straße | 113893 | 1.5%
Lange Reihe / Lohmühlenpark | 109179 | 1.4%
Jarrestraße / Rambatzweg | 108136 | 1.4%
Neuer Pferdemarkt / Beim Grünen Jäger | 103347 | 1.3%
Paulinenplatz/Wohlwillstraße | 103000 | 1.3%
Eduard-Rhein-Ufer / Schwanenwik | 102102 | 1.3%
Other values (198) | 6495659 | 84.7%
[Figure: frequencies of value counts]

Unique

Unique: 0
Unique (%): 0.0%
[Figure: histogram of lengths of the category]

Length

Max length: 48
Median length: 28
Mean length: 28.9019579
Min length: 14

end_station_id
Real number (ℝ≥0)

Distinct: 208
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 170111.856
Minimum: 131543
Maximum: 268358
Zeros: 0
Zeros (%): 0.0%
Memory size: 58.5 MiB

Quantile statistics

Minimum: 131543
5-th percentile: 131641
Q1: 131887
median: 140796
Q3: 211706
95-th percentile: 244935
Maximum: 268358
Range: 136815
Interquartile range (IQR): 79819

Descriptive statistics

Standard deviation: 41577.36347
Coefficient of variation (CV): 0.244411909
Kurtosis: -1.327636139
Mean: 170111.856
Median Absolute Deviation (MAD): 9154
Skewness: 0.5283857597
Sum: 1.30474943e+12
Variance: 1728677153
Monotonicity: Not monotonic
[Figure: histogram with fixed size bins (bins=50)]

Common values
Value | Count | Frequency (%)
198077 | 161211 | 2.1%
131648 | 145284 | 1.9%
131879 | 114148 | 1.5%
140796 | 113991 | 1.5%
140799 | 113893 | 1.5%
138385 | 109179 | 1.4%
138376 | 108136 | 1.4%
131890 | 103347 | 1.3%
131547 | 103000 | 1.3%
140800 | 102102 | 1.3%
Other values (198) | 6495659 | 84.7%

Minimum 5 values
Value | Count | Frequency (%)
131543 | 96289 | 1.3%
131546 | 53211 | 0.7%
131547 | 103000 | 1.3%
131639 | 76155 | 1.0%
131640 | 20242 | 0.3%

Maximum 5 values
Value | Count | Frequency (%)
268358 | 72 | < 0.1%
264821 | 2968 | < 0.1%
264820 | 6607 | 0.1%
264330 | 1087 | < 0.1%
256467 | 5497 | 0.1%

booked_via
Categorical

HIGH CARDINALITY

Distinct: 223
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 58.5 MiB

Value | Count | Frequency (%)
iPhone SRH | 2012624 | 26.2%
Android SRH | 1460250 | 19.0%
IVR | 763349 | 10.0%
iPhone CAB | 210597 | 2.7%
Android CAB | 82403 | 1.1%
Unknown | 77217 | 1.0%
terminal HH_93 (-2215-) | 65395 | 0.9%
Terminal HH_5 (-2132-) | 51094 | 0.7%
Terminal HH_79 (-2323-) | 48210 | 0.6%
Terminal HH_75 (-2364-) | 44666 | 0.6%
Other values (213) | 2854145 | 37.2%
[Figure: frequencies of value counts]

Unique

Unique: 1
Unique (%): < 0.1%
[Figure: histogram of lengths of the category]

Length

Max length: 28
Median length: 11
Mean length: 14.68398999
Min length: 3

duration_in_min
Real number (ℝ≥0)

Distinct: 33
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 14.05734509
Minimum: 1
Maximum: 33
Zeros: 0
Zeros (%): 0.0%
Memory size: 58.5 MiB

Quantile statistics

Minimum: 1
5-th percentile: 4
Q1: 8
median: 13
Q3: 19
95-th percentile: 28
Maximum: 33
Range: 32
Interquartile range (IQR): 11

Descriptive statistics

Standard deviation: 7.37423814
Coefficient of variation (CV): 0.5245825645
Kurtosis: -0.4793869522
Mean: 14.05734509
Median Absolute Deviation (MAD): 5
Skewness: 0.6106385271
Sum: 107819134
Variance: 54.37938814
Monotonicity: Not monotonic
[Figure: histogram with fixed size bins (bins=33)]

Common values
Value | Count | Frequency (%)
8 | 451009 | 5.9%
7 | 448232 | 5.8%
9 | 442258 | 5.8%
6 | 430248 | 5.6%
10 | 427810 | 5.6%
11 | 408080 | 5.3%
12 | 384119 | 5.0%
5 | 364379 | 4.8%
13 | 361730 | 4.7%
14 | 338326 | 4.4%
Other values (23) | 3613759 | 47.1%

Minimum 5 values
Value | Count | Frequency (%)
1 | 992 | < 0.1%
2 | 9676 | 0.1%
3 | 145243 | 1.9%
4 | 260936 | 3.4%
5 | 364379 | 4.8%

Maximum 5 values
Value | Count | Frequency (%)
33 | 53896 | 0.7%
32 | 61895 | 0.8%
31 | 72807 | 0.9%
30 | 82244 | 1.1%
29 | 92964 | 1.2%

distance_in_km
Real number (ℝ≥0)

ZEROS

Distinct: 13808
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1.879521843
Minimum: 0
Maximum: 19.19140257
Zeros: 293637
Zeros (%): 3.8%
Memory size: 58.5 MiB

Quantile statistics

Minimum: 0
5-th percentile: 0.4124845333
Q1: 0.9761789274
median: 1.598932269
Q3: 2.583083313
95-th percentile: 4.240455163
Maximum: 19.19140257
Range: 19.19140257
Interquartile range (IQR): 1.606904385

Descriptive statistics

Standard deviation: 1.21018557
Coefficient of variation (CV): 0.6438794923
Kurtosis: 0.7775827794
Mean: 1.879521843
Median Absolute Deviation (MAD): 0.742906618
Skewness: 0.933151183
Sum: 14415838.56
Variance: 1.464549114
Monotonicity: Not monotonic
[Figure: histogram with fixed size bins (bins=50)]

Common values
Value | Count | Frequency (%)
0 | 293637 | 3.8%
0.6899266912 | 35324 | 0.5%
0.4670319582 | 28565 | 0.4%
0.7730821796 | 23199 | 0.3%
0.838103895 | 21517 | 0.3%
0.6455644214 | 21438 | 0.3%
1.428854315 | 20358 | 0.3%
0.5965872349 | 19770 | 0.3%
0.8582389897 | 19528 | 0.3%
1.02143741 | 19000 | 0.2%
Other values (13798) | 7167614 | 93.5%

Minimum 5 values
Value | Count | Frequency (%)
0 | 293637 | 3.8%
0.08803222592 | 296 | < 0.1%
0.1058156515 | 1043 | < 0.1%
0.1215029514 | 260 | < 0.1%
0.1298428182 | 341 | < 0.1%

Maximum 5 values
Value | Count | Frequency (%)
19.19140257 | 1 | < 0.1%
18.40713798 | 1 | < 0.1%
14.9559601 | 1 | < 0.1%
14.78695103 | 1 | < 0.1%
13.78492366 | 1 | < 0.1%

Interactions

[Figures: 36 interaction plots between pairs of the numeric variables; images not included]

Correlations


Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation, and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and (positive) scale of the two variables, so for an exactly linear relationship the steepness of the line does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
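
As a minimal sketch of that definition (not the code pandas-profiling itself runs), r can be computed in plain Python. The function name `pearson_r` and the toy data are illustrative only. The 1/n factors of the covariance and the standard deviations cancel, so sums of deviations are enough.

```python
import math


def pearson_r(xs, ys):
    """Covariance of xs and ys divided by the product of their standard deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Toy data, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
r = pearson_r(xs, ys)
print(round(r, 4))  # 0.7746

# Shifting and (positively) rescaling one variable leaves r unchanged.
r_rescaled = pearson_r([10 * x + 3 for x in xs], ys)
print(math.isclose(r, r_rescaled))  # True
```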

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation, and +1 indicating total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
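
A minimal sketch of this recipe in plain Python follows: convert each variable to ranks, then apply the covariance-over-standard-deviations formula to the ranks. Ties receive the average of the positions they occupy, which is a common convention and an assumption here; the function names and toy data are illustrative only.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson_r(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov / math.sqrt(sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys))


def spearman_rho(xs, ys):
    """Pearson correlation of the rank variables."""
    return pearson_r(average_ranks(xs), average_ranks(ys))


# Monotonic but nonlinear toy relationship: y = x ** 3.
xs = [1, 2, 3, 4, 5]
ys = [x ** 3 for x in xs]
print(average_ranks([10, 20, 20, 30]))  # [1.0, 2.5, 2.5, 4.0]
print(spearman_rho(xs, ys))             # 1.0
print(pearson_r(xs, ys) < 1.0)          # True
```

The last two lines show the point made in the text: the relationship is perfectly monotonic, so ρ is 1, while Pearson's r on the raw values stays below 1.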

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation, and +1 indicating total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the discordant pairs divided by the total number of pairs.
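
The formula as worded above (concordant minus discordant pairs, divided by all pairs) is the variant usually called tau-a. A minimal sketch in plain Python is given below; the function name and toy data are illustrative only. Statistical libraries commonly report the tau-b variant, which adjusts the denominator for ties, so results on tied data may differ from this sketch.

```python
from itertools import combinations


def kendall_tau_a(xs, ys):
    """(concordant - discordant) / total number of pairs; tied pairs count as neither."""
    pairs = list(combinations(range(len(xs)), 2))
    concordant = discordant = 0
    for i, j in pairs:
        sign = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if sign > 0:
            concordant += 1   # both variables move in the same direction
        elif sign < 0:
            discordant += 1   # the variables move in opposite directions
    return (concordant - discordant) / len(pairs)


# Toy data: of the 3 pairs, 2 are concordant and 1 is discordant.
tau = kendall_tau_a([1, 2, 3], [1, 3, 2])
print(round(tau, 4))  # 0.3333
```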

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in case of a bivariate normal input distribution. Extensive documentation is available from the Phik project.

Missing values

[Figures: 2 missing-value plots; images not included]

Sample

First rows

(row) | df_index | bike_id | user_id | date_from | date_until | start_station_name | start_station_id | end_station_name | end_station_id | booked_via | duration_in_min | distance_in_km
0 | 0 | 143517 | A821059B555C7764A2FF801180874A2FCB326222 | 2014-01-01 00:34:54 | 2014-01-01 00:50:14 | U-Bahn Baumwall | 214170 | Mönckebergstraße / Rosenstraße | 131880 | iPhone SRH | 16 | 1.293661
1 | 1 | 119830 | 1EBC930DB407ACEAE2FDE23A6CA40492EA3DFBB2 | 2014-01-01 01:39:55 | 2014-01-01 01:57:27 | Bahnhof Altona Ost/Max-Brauer-Allee | 131646 | Schulterblatt/Eifflerstraße | 131648 | Android SRH | 18 | 2.032271
2 | 2 | 143501 | 7AD2C1B70137479062A6DD73815835986677BB2D | 2014-01-01 01:40:20 | 2014-01-01 01:53:09 | Weidestraße/Biedermannplatz | 211922 | Jarrestraße / Rambatzweg | 138376 | Techniker HH_119 (-2334-) | 13 | 0.954178
3 | 4 | 108641 | 4F4F752203EA6FC872D576E9289C4E1B362E16F6 | 2014-01-01 02:05:55 | 2014-01-01 02:13:49 | Mundsburg / Schürbeker Straße | 140799 | Bartholomäusstraße/Beim Alten Schützenhof | 211923 | iPhone SRH | 8 | 0.693159
4 | 5 | 143829 | FEA7FF33A3252EE99E58B9E15724AA861CAB1DDF | 2014-01-01 02:29:03 | 2014-01-01 02:32:41 | Krausestraße/Eilbektal | 208295 | Lortzingstraße/Friedrichsberger Straße | 213833 | iPhone SRH | 4 | 0.645564
5 | 6 | 143552 | 60A788942F6A49BF54DB9013DB05428F897FCCCE | 2014-01-01 03:07:07 | 2014-01-01 03:20:08 | Winterhuder Weg/ Zimmerstraße | 208292 | Wiesendamm/Roggenkamp | 212607 | Android SRH | 14 | 1.977492
6 | 7 | 120327 | E32FF481BF244603D691DED875AC4FBEDCF96BFB | 2014-01-01 03:12:50 | 2014-01-01 03:14:54 | Bahnhof Altona Ost/Max-Brauer-Allee | 131646 | Bahnhof Altona Ost/Max-Brauer-Allee | 131646 | Terminal HH_55 (-2121-) | 3 | 0.000000
7 | 9 | 143577 | 708275C3A732D3BD47E97F1E0AC3AE01735FA170 | 2014-01-01 04:27:51 | 2014-01-01 04:45:18 | Hofweg/Am Langenzug | 200502 | Eppendorfer Weg/Hoheluftchaussee | 198086 | Android SRH | 18 | 2.830757
8 | 10 | 143580 | 4FCAC2DAFF984CC2FFC85D0B87D577D266010745 | 2014-01-01 04:58:33 | 2014-01-01 05:12:31 | Löwenstraße/Eppendorfer Weg | 213680 | Heußweg/Wiesenstraße | 201326 | Techniker HH_138 (-2244-) | 14 | 1.775725
9 | 12 | 119948 | 092D25BAD64832AE3F69488573BA5C398C25B51D | 2014-01-01 01:08:18 | 2014-01-01 01:13:02 | Isestraße / Hoheluftbrücke | 140804 | Eppendorfer Weg/Hoheluftchaussee | 198086 | IVR | 5 | 0.555314

Last rows

(row) | df_index | bike_id | user_id | date_from | date_until | start_station_name | start_station_id | end_station_name | end_station_id | booked_via | duration_in_min | distance_in_km
7669940 | 16228280 | 108747 | B4E335F11DC342724FD0C5208AF2B0AC89D2B581 | 2017-05-15 22:36:58 | 2017-05-15 22:46:49 | Alsenstraße/Düppelstraße | 211706 | Osterstraße/Bismarckstraße | 131642 | Terminal HH_95 (-2134-) | 10 | 1.463426
7669941 | 16228281 | 109223 | E83B9222C38C025523BE16AB2F00530522EF93CC | 2017-05-15 22:38:44 | 2017-05-15 22:44:26 | Wiesendamm/Roggenkamp | 212607 | Schleidenstraße/Osterbekstraße | 208307 | iPhone SRH | 6 | 0.670447
7669942 | 16228282 | 116034 | 6F18F52C068612DC5AE3530A451FC80222F7B4C9 | 2017-05-15 22:45:13 | 2017-05-15 22:59:26 | Eimsbütteler Straße/Waterloostraße | 131644 | Lappenbergsallee / Bei der Apostelkirche | 243618 | Android CAB | 15 | 1.198705
7669943 | 16228283 | 117564 | 4C3C86C70B705E7075399845BFDF571548A50B90 | 2017-05-15 22:45:47 | 2017-05-15 23:00:21 | Neumühlen/Övelgönne | 213856 | Bahnhof Altona West / Busbahnhof | 131889 | Techniker HH_135 (-2151-) | 15 | 1.518474
7669944 | 16228288 | 120311 | C639C55CFF8334C7C7983E278A9B257B044F00F1 | 2017-05-15 23:44:43 | 2017-05-16 00:13:44 | Goldbekplatz / Semperstraße | 140796 | Berliner Tor / Berlinertordamm | 131652 | Terminal HH_73 (-2363-) | 30 | 3.529680
7669945 | 16228289 | 143621 | FF0963FE7D54E9455B5CF1ADE5DFEF484F8C525F | 2017-05-16 01:11:33 | 2017-05-16 01:32:16 | Schulterblatt/Eifflerstraße | 131648 | Fischersallee/Bleickenallee | 211711 | Unknown | 21 | 2.880246
7669946 | 16228290 | 109115 | FF7147E7A3583564085352944933642F67C4D755 | 2017-05-16 03:25:09 | 2017-05-16 03:31:05 | Königstraße / Struenseestraße | 131650 | Große Rainstraße/Ottenser Hauptstraße | 244943 | iPhone SRH | 6 | 0.989741
7669947 | 16228291 | 116255 | 5BB54A7EBCD7A5A88FD410A537E10160BA120BB2 | 2017-05-16 07:15:40 | 2017-05-16 07:19:49 | Heußweg/Wiesenstraße | 201326 | Lappenbergsallee / Bei der Apostelkirche | 243618 | Terminal HH_11 (-2225-) | 5 | 0.620216
7669948 | 16228293 | 119663 | 1024F6970D5BE146588D64F6AF427E147ADC642E | 2017-05-16 07:36:36 | 2017-05-16 07:44:16 | Bahnhof Altona Ost/Max-Brauer-Allee | 131646 | Neuer Pferdemarkt / Beim Grünen Jäger | 131890 | iPhone SRH | 8 | 1.990734
7669949 | 16228295 | 120488 | CC6405146B51242A9169AB55E88A5C472EA1B2AA | 2017-05-16 07:40:17 | 2017-05-16 07:50:07 | Weidestraße/Biedermannplatz | 211922 | Mundsburg / Schürbeker Straße | 140799 | Techniker HH_119 (-2334-) | 10 | 1.241150